{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Roboschool simulations training with stable baselines on Amazon SageMaker RL"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Introductions"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Roboschool is an [open source](https://github.com/openai/roboschool/tree/master/roboschool) physics simulator that is commonly used to train RL policies for robotic systems.  Roboschool defines a [variety](https://github.com/openai/roboschool/blob/master/roboschool/__init__.py) of Gym environments that correspond to different robotics problems. One of them is **HalfCheetah** which is a two-legged robot, restricted to a vertical plane, meaning it can only run forward or backward.\n",
    "\n",
    "In this notebook example, we will make **HalfCheetah** learn to walk using the [stable-baselines](https://stable-baselines.readthedocs.io/en/master/) a set of improved implementations of Reinforcement Learning (RL) algorithms based on [OpenAI Baselines](https://github.com/openai/baselines). \n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "roboschool_problem = 'half-cheetah'"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Pre-requisites \n",
    "\n",
    "### Imports\n",
    "\n",
    "To get started, we'll import the Python libraries we need, set up the environment with a few prerequisites for permissions and configurations."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import sagemaker\n",
    "import boto3\n",
    "import sys\n",
    "import os\n",
    "import subprocess\n",
    "from IPython.display import HTML\n",
    "import time\n",
    "from time import gmtime, strftime\n",
    "sys.path.append(\"common\")\n",
    "from misc import get_execution_role, wait_for_s3_object\n",
    "from docker_utils import build_and_push_docker_image\n",
    "from sagemaker.rl import RLEstimator"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Setup S3 bucket\n",
    "\n",
    "Set up the linkage and authentication to the S3 bucket that you want to use for checkpoint and the metadata. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "sage_session = sagemaker.session.Session()\n",
    "s3_bucket = sage_session.default_bucket()  \n",
    "s3_output_path = 's3://{}/'.format(s3_bucket)\n",
    "print(\"S3 bucket path: {}\".format(s3_output_path))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Define Variables \n",
    "\n",
    "We define variables such as the job prefix for the training jobs *and the image path for the container (only when this is BYOC).*"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# create a descriptive job name \n",
    "job_name_prefix = 'rl-roboschool-'+roboschool_problem"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Configure where training happens\n",
    "\n",
    "You can train your RL training jobs using the SageMaker notebook instance or local notebook instance. In both of these scenarios, you can run the following in either local or SageMaker modes. The local mode uses the SageMaker Python SDK to run your code in a local container before deploying to SageMaker. This can speed up iterative testing and debugging while using the same familiar Python SDK interface. You just need to set `local_mode = True`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# run in local_mode on this machine, or as a SageMaker TrainingJob?\n",
    "local_mode = False\n",
    "\n",
    "if local_mode:\n",
    "    instance_type = 'local'\n",
    "else:\n",
    "    # If on SageMaker, pick the instance type\n",
    "    instance_type = \"ml.c4.xlarge\"\n",
    "    "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Create an IAM role\n",
    "\n",
    "Either get the execution role when running from a SageMaker notebook instance `role = sagemaker.get_execution_role()` or, when running from local notebook instance, use utils method `role = get_execution_role()` to create an execution role."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "try:\n",
    "    role = sagemaker.get_execution_role()\n",
    "except:\n",
    "    role = get_execution_role()\n",
    "print(\"Using IAM role arn: {}\".format(role))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Install docker for `local` mode\n",
    "\n",
    "In order to work in `local` mode, you need to have docker installed. When running from you local machine, please make sure that you have docker and docker-compose (for local CPU machines) and nvidia-docker (for local GPU machines) installed. Alternatively, when running from a SageMaker notebook instance, you can simply run the following script to install dependenceis.\n",
    "\n",
    "Note, you can only run a single local notebook at one time."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# only run from SageMaker notebook instance\n",
    "if local_mode:\n",
    "    !/bin/bash ./common/setup.sh"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Build docker container\n",
    "\n",
    "We must build a custom docker container with Roboschool installed.  This takes care of everything:\n",
    "\n",
    "1. Fetching base container image\n",
    "2. Installing Roboschool and its dependencies\n",
    "3. Installing stable-baselines and its dependencies such as OpenMPI, etc.\n",
    "3. Uploading the new container image to ECR\n",
    "\n",
    "This step can take a long time if you are running on a machine with a slow internet connection.  If your notebook instance is in SageMaker or EC2 it should take 3-10 minutes depending on the instance type.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%time\n",
    "\n",
    "cpu_or_gpu = 'gpu' if instance_type.startswith('ml.p') else 'cpu'\n",
    "repository_short_name = \"sagemaker-roboschool-stablebaselines-%s\" % cpu_or_gpu\n",
    "docker_build_args = { \n",
    "    'AWS_REGION': boto3.Session().region_name,\n",
    "}\n",
    "custom_image_name = build_and_push_docker_image(repository_short_name, build_args=docker_build_args)\n",
    "print(\"Using ECR image %s\" % custom_image_name)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Write the Training Code"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Configure the presets for RL algorithm\n",
    "\n",
    "The presets that configure the RL training jobs are defined in the `preset-half-cheetah.py` in the `./src` directory. Using the preset file, you can define agent parameters to select the specific agent algorithm. You can also set the environment parameters, define the schedule and visualization parameters, and define the graph manager. The schedule presets will define following hyper-parameters for PPO1 training:\n",
    "* `num_timesteps`: (int) Number of training steps - Preset: 1e4\n",
    "* `timesteps_per_actorbatch` – (int) timesteps per actor per update - Preset: 2048\n",
    "* `clip_param` – (float) clipping parameter epsilon - Preset: 0.2\n",
    "* `entcoeff` – (float) the entropy loss weight - Preset: 0.0\n",
    "* `optim_epochs` – (float) the optimizer’s number of epochs - Preset: 10\n",
    "* `optim_stepsize` – (float) the optimizer’s stepsize - Preset: 3e-4\n",
    "* `optim_batchsize` – (int) the optimizer’s the batch size - Preset: 64\n",
    "* `gamma` – (float) discount factor - Preset: 0.99\n",
    "* `lam` – (float) advantage estimation - Preset: 0.95\n",
    "* `schedule` – (str) The type of scheduler for the learning rate update (‘linear’, ‘constant’, ‘double_linear_con’, ‘middle_drop’ or ‘double_middle_drop’) - Preset: linear\n",
    "* `verbose` – (int) the verbosity level: 0 none, 1 training information, 2 tensorflow debug - Preset: 1\n",
    "\n",
    "You can refer the complete list of args and documentation for PPO1 algorithm here: https://stable-baselines.readthedocs.io/en/master/modules/ppo1.html\n",
    "\n",
    "These can be overridden at runtime by specifying the RLSTABLEBASELINES_PRESET hyperparameter. Additionally, it can be used to define custom hyperparameters."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!pygmentize src/preset-{roboschool_problem}.py"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Write the Training Code\n",
    "\n",
    "The training code is in the file `train-coach.py` which is also the `./src` directory."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!pygmentize src/train_stable_baselines.py"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Train the RL model using the Python SDK Script mode\n",
    "\n",
    "If you are using local mode, the training will run on the notebook instance. When using SageMaker for training, you can select a GPU or CPU instance. The RLEstimator is used for training RL jobs. \n",
    "\n",
    "1. Specify the source directory where the environment, presets and training code is uploaded.\n",
    "2. Specify the entry point as the training code \n",
    "3. Specify the choice of RL toolkit and framework. This automatically resolves to the ECR path for the RL Container. \n",
    "4. Define the training parameters such as the instance count, job name, S3 path for output and job name. \n",
    "5. Specify the hyperparameters for the RL agent algorithm. The `RLSTABLEBASELINES_PRESET` can be used to specify the RL agent algorithm you want to use. \n",
    "6. Define the metrics definitions that you are interested in capturing in your logs. These can also be visualized in CloudWatch and SageMaker Notebooks. \n",
    "\n",
    "Please note all the configured preset parameters in `preset-half-cheetah.py` can be overriden by specifying the overriden value in `hyperparameters` block.\n",
    "\n",
    "**Note**: For MPI based jobs, local mode is only supported for single instance jobs. Please use `instance_type` as `1` if using local mode."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%time\n",
    "\n",
    "estimator = RLEstimator(entry_point=\"train_stable_baselines.py\",\n",
    "                        source_dir='src',\n",
    "                        dependencies=[\"common/sagemaker_rl\"],\n",
    "                        image_name=custom_image_name,\n",
    "                        role=role,\n",
    "                        train_instance_type=instance_type,\n",
    "                        train_instance_count=2,\n",
    "                        output_path=s3_output_path,\n",
    "                        base_job_name=job_name_prefix,\n",
    "                        hyperparameters={\n",
    "                            \"RLSTABLEBASELINES_PRESET\":\"preset-{}.py\".format(roboschool_problem),\n",
    "                            \"num_timesteps\":1e4,\n",
    "                            \"instance_type\":instance_type\n",
    "                        },\n",
    "                        metric_definitions= [\n",
    "                            {\n",
    "                                \"Name\":\"EpisodesLengthMean\",\n",
    "                                \"Regex\":\"\\[.*,.*\\]\\<stdout\\>\\:\\| *EpLenMean *\\| *([-+]?[0-9]*\\.?[0-9]*) *\\|\"\n",
    "                            },\n",
    "                            {\n",
    "                                \"Name\":\"EpisodesRewardMean\",\n",
    "                                \"Regex\":\"\\[.*,.*\\]\\<stdout\\>\\:\\| *EpRewMean *\\| *([-+]?[0-9]*\\.?[0-9]*) *\\|\"\n",
    "                            },\n",
    "                            {\n",
    "                                \"Name\":\"EpisodesSoFar\",\n",
    "                                \"Regex\":\"\\[.*,.*\\]\\<stdout\\>\\:\\| *EpisodesSoFar *\\| *([-+]?[0-9]*\\.?[0-9]*) *\\|\"\n",
    "                            }\n",
    "                        ]\n",
    "                    )\n",
    "\n",
    "estimator.fit(wait=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Visualization\n",
    "\n",
    "RL training can take a long time.  So while it's running there are a variety of ways we can track progress of the running training job.  Some intermediate output gets saved to S3 during training, so we'll set up to capture that."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Fetch videos of training rollouts\n",
    "Videos of certain rollouts get written to S3 during training.  Here we fetch all that are available, and \n",
    "render the last one."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "\n",
    "job_name = estimator.latest_training_job.job_name\n",
    "print(\"Training job: %s\" % job_name)\n",
    "\n",
    "s3_url = \"s3://{}/{}\".format(s3_bucket,job_name)\n",
    "\n",
    "if local_mode:\n",
    "    output_tar_key = \"{}/output.tar.gz\".format(job_name)\n",
    "else:\n",
    "    output_tar_key = \"{}/output/output.tar.gz\".format(job_name)\n",
    "\n",
    "intermediate_folder_key = \"{}/output/intermediate\".format(job_name)\n",
    "output_url = \"s3://{}/{}\".format(s3_bucket, output_tar_key)\n",
    "intermediate_url = \"s3://{}/{}\".format(s3_bucket, intermediate_folder_key)\n",
    "\n",
    "print(\"S3 job path: {}\".format(s3_url))\n",
    "print(\"Output.tar.gz location: {}\".format(output_url))\n",
    "print(\"Intermediate folder path: {}\".format(intermediate_url))\n",
    "    \n",
    "tmp_dir = \"/tmp/{}\".format(job_name)\n",
    "os.system(\"mkdir {}\".format(tmp_dir))\n",
    "print(\"Create local folder {}\".format(tmp_dir))\n",
    "wait_for_s3_object(s3_bucket, intermediate_folder_key, tmp_dir) "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### RL output video"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import io\n",
    "import base64\n",
    "video = io.open(\"{}/rl_out.mp4\".format(tmp_dir), 'r+b').read()\n",
    "encoded = base64.b64encode(video)\n",
    "HTML(data='''<video alt=\"test\" controls>\n",
    "                <source src=\"data:video/mp4;base64,{0}\" type=\"video/mp4\" />\n",
    "             </video>'''.format(encoded.decode('ascii')))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Example of trained walking HalfCheetah\n",
    "\n",
    "This is the output of the training job triggered bu above code, with following additional configurations:\n",
    "* `train_instance_count`: 10\n",
    "* `train_instance_type`: ml.c4.xlarge\n",
    "* `num_timesteps`: 1e7\n",
    "\n",
    "It took 40 min to train the model with the above settings. You can have similar output with lesser instances and more training duration."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import io\n",
    "import base64\n",
    "video = io.open(\"examples/robo_half_cheetah_10x_40min.mp4\", 'r+b').read()\n",
    "encoded = base64.b64encode(video)\n",
    "HTML(data='''<video alt=\"test\" controls>\n",
    "                <source src=\"data:video/mp4;base64,{0}\" type=\"video/mp4\" />\n",
    "             </video>'''.format(encoded.decode('ascii')))"
   ]
  }
 ],
 "metadata": {
  "anaconda-cloud": {},
  "kernelspec": {
   "display_name": "conda_tensorflow_p36",
   "language": "python",
   "name": "conda_tensorflow_p36"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.5"
  },
  "notice": "Copyright 2018 Amazon.com, Inc. or its affiliates. All Rights Reserved. Licensed under the Apache License, Version 2.0 (the \"License\"). You may not use this file except in compliance with the License. A copy of the License is located at http://aws.amazon.com/apache2.0/ or in the \"license\" file accompanying this file. This file is distributed on an \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License."
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
